DOCUMENT RESUME

ED 126 126                                             TM 005 382

AUTHOR        Green, Donald Ross
TITLE         Reducing Bias in Achievement Tests.
PUB DATE      [Apr 76]
NOTE          16p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (60th, San
              Francisco, California, April 19-23, 1976)

EDRS PRICE    MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS   *Achievement Tests; *Item Analysis; Mathematical
              Models; Minority Groups; Racial Discrimination;
              Standardized Tests; Statistical Analysis; *Test Bias;
              *Test Construction; *Testing Problems; *Test
              Validity



ABSTRACT

              During the past few years the problem of bias in
testing has become an increasingly important issue. In most research,
bias refers to the fair use of tests and has thus been defined in
terms of an outside criterion measure of the performance being
predicted by the test. Recently, however, there has been growing
interest in assessing bias when such criteria are not available. In
test construction in particular, where criterion measures are usually
not collected until after the test is completed, assessment of bias
in the absence of criteria has become a vital issue. If unbiased
tests are to be built, it is important to identify potentially biased
items during the construction process when test content is still
flexible and items may still be modified or eliminated. Presented
here are the author's research efforts over the past six years on
bias in the construction of achievement tests. A general overview of
the problem and some of the difficulties involved in studying it are
also presented. (Author/DEP)






Reducing Bias in Achievement Tests

Donald Ross Green
CTB/McGraw-Hill

Paper presented at
Annual Meeting of American Educational Research Association
San Francisco, CA
April 20, 1976



In this paper I shall review the somewhat sporadic efforts I have made
in the last six years to deal with bias in the construction of achievement
tests. However, before proceeding I would like to make three points.

First, the topic of test bias usually arouses emotions. To most people
it is obvious that bias is "bad" and it is widely assumed that biased tests
are built by bad people trying to do bad things to others. These assumptions
are sometimes accompanied by the view that all tests are bad, especially if
they are published. Those who feel this way should be warned that in this
paper I will make exactly the opposite set of assumptions. I will assume
that tests, especially published tests, are good and that they are built by
good people trying to do good things. Now I won't go so far as to say that
biased tests are good, but they are not necessarily bad.

I won't assume that biased tests are necessarily bad because test bias
and unfairness are not the same thing. That is, unfairness is a function of
how a test is used while bias is a characteristic of the test. We may say
a test is biased when it systematically measures different things for one
group than it measures for another. Note that if the biased test is used
as though it is measuring the same thing, i.e., if it is used as though it
was not biased, then it becomes unfair. A biased test is likely to be used
in an unfair way but it doesn't have to happen. To prevent it one would
have to know that the test was biased and either not use it with the wrong
group, or if one could tell by how much, allow for or correct for the bias.
If it is not known that a test is biased then it will presumably be used
as though it is measuring the same thing for all groups and will probably
lead to unfairness for one or more of them. Thus the problem is to detect
bias in a test. In the absence of an outside criterion the task is not to
judge fairness but to ascertain if the test is measuring the same things
for the various groups under consideration.






No generally accepted set of procedures for doing this exists. One
may of course have experts from various groups examine the test and try
to judge the matter. I believe most test publishers do this. As I see
it this is a useful procedure but hardly definitive. More empirical
procedures are needed. Therefore, for lack of better, most of us here have
turned to trying to detect bias in items rather than in test scores. It
may seem at first blush that the two are the same thing, but unfortunately
they are not. The hope is that we can build unbiased tests by eliminating
biased items. However, there are many problems and we have much to learn
as yet.

Thus secondly I would note that bias in achievement test construction
is an important topic sadly neglected in research. It appears that until
just a few years ago most members of the measurement community did not
consider it an important issue. To date aptitude tests and selection have
received the lion's share of attention, but even there construction has been
a neglected aspect. Yet achievement testing is, I believe, the larger
enterprise by far and, as the potential determiner of the fate of our various
efforts to improve schooling, is more important to boot. The relatively
small effort on the topic is illustrated by the fact (I believe it is
accurate) that the work of the members of this panel encompasses a
substantial portion, if not the bulk, of the effort in this area. I am of
course including the work of Coffman, whose studies established one of the
major lines of attack on the problem some years ago (Coffman, 1961; Cardall &
Coffman, 1964), and that of Angoff, who has continued and extended this
work substantially in a series of studies. I should point out right now
that Angoff wrote a fairly comprehensive review of the work on the
topic of this symposium for an NIE conference last December (Angoff, 1975).
It was, as you would expect, a good paper and makes unnecessary any
literature summary now. In fact, it would seem doubly silly to do so
since our roles were reversed at that conference.




My third point is one which I made at that time (Green, 1975). It
is that we should not accept the absence of outside criteria. I have come
to believe that achievement tests should be validated against outside
criteria. Listen to the critics of the current crop of standardized
achievement tests. They do not find the content and construct validity
evidence offered for the tests very compelling (e.g., Levine, 1976), and many
are becoming increasingly convinced that the tests are badly biased. It is
true that in a number of cases these beliefs appear to be based more on
distrust and suspicion than on evidence, but others have made some logical
arguments for that viewpoint (see, for example, Green, Nyquist & Griffore,
1975). I believe that the potential merit of the criticism is strong enough
so that only evidence from external criteria will suffice to counter them.

This does not mean I think we are wasting our time here. In the first
place, even if everyone were to agree immediately about this need, it will
take a long time to develop consensus about criterion measures. Furthermore,
as a practical matter during test construction one must usually function
without an outside criterion and it is the test construction process we are
discussing. Ultimately the validity of a test has to be established in use
but it does have to be built and it would be most valuable if we knew what
internal characteristics of tests and items tended to make a test biased.
It was with this sort of question in mind that I started working on the
topic about six years ago.

Most of this work on bias has been concerned with the item selection
aspect of test development. To avoid misunderstanding let me describe where
that fits in the test development process. After the rationale and
specifications of the test are produced, items are written to fit these
specifications. They are edited and assembled into tryout tests, which are
then administered to a sample of the target population. From the results of
this tryout an item analysis is produced which becomes the basis for
selecting the items for the final version of the test.

Item selection involves, first, eliminating defective items since
no matter how experienced the item writers and how careful the editing,
some of these always appear. If necessary, i.e., if there are not enough
substitutes, these items are revised in hopes of improving them, but this
is not desirable since then one is less certain about the difficulty level
and discrimination power of the resulting test unless another tryout is
done. Next comes selecting the most efficient and effective set of the
remaining items. Efficiency and effectiveness relate to difficulty and
discrimination. One would like to insure that the test contains a set of
items with a suitable range of difficulties; that the items show growth
over the period of time the topic is taught; and that the resulting test
will exhibit adequate reliability. Thus items should discriminate and
each should contribute to the reliability of its subtest. Other things
being equal, items with good item-test correlations are the ones to choose.
In short, it is highly desirable to have a choice of items that one can
use and still meet the rationale and content specifications for the test.

At CTB the practice is to try out anywhere from 1 1/2 to 3 times as
many items as are ultimately needed. A 2 to 1 ratio is typical. The higher
ratio is used for those areas and item types about which relatively little
is known or which have been found difficult to deal with in the past. Thus
ordinarily there are several items to choose from for each content category.
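The item-test correlation used for this kind of selection can be sketched in a few lines. The following is only an illustration, assuming items scored 0/1 and a total score that includes the item itself; the function names `point_biserial` and `select_best` and the sample responses are hypothetical and are not taken from the CTB procedures.

```python
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    p = sum(item_scores) / n                 # proportion passing the item
    sd = pstdev(total_scores)                # population SD of total scores
    if p in (0.0, 1.0) or sd == 0:
        return 0.0                           # undefined; treated as zero here
    mean_pass = mean(t for i, t in zip(item_scores, total_scores) if i == 1)
    mean_fail = mean(t for i, t in zip(item_scores, total_scores) if i == 0)
    return (mean_pass - mean_fail) / sd * (p * (1 - p)) ** 0.5

def select_best(responses, keep):
    """Return the indices of the `keep` items with the highest point
    biserials, together with the full list of point biserials."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    r = [point_biserial([row[j] for row in responses], totals)
         for j in range(n_items)]
    ranked = sorted(range(n_items), key=lambda j: (-r[j], j))
    return sorted(ranked[:keep]), r

# Hypothetical tryout data: six examinees by four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
chosen, r = select_best(responses, keep=2)
print(chosen, [round(x, 3) for x in r])
```

In practice the choice would also be constrained by content category and difficulty, which this sketch leaves out.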

When we first started thinking about test bias it occurred to us
that this selection step might be accentuating the bias. The argument
goes like this: there may be characteristics of the tryout sample which
influence their ways of responding to the items in addition to those the
test is intended to measure. Such things as general background knowledge,
language styles and dialects, cultural values and motivations are likely
candidates. Items responsive to these characteristics will tend to look
like good items and will tend to be selected. Since tryout groups usually
more nearly represent the majority than the minority in these things (i.e.,
minorities are in the minority), majority group characteristics will
determine selection. So in my first study the question was: would one choose
the same items if one had a minority tryout group instead of a "standard",
i.e., representative, group? Using the standardization data for CAT-70 I
set up various tryout groups differing by race, SES, and region of the
country. Using the point biserial as the basis for selection, the "best"
half of the items in the several tests of the 1970 California Achievement
Tests were chosen for each of the various ethnic and regional groups studied.
The purpose was to see if the same items would be selected if minority groups
were used for the item tryouts. The differences in choices varied from
20%-50% of the items chosen. Therefore it appears that tests built from
minority group tryouts would differ from the tests created using the usual
sort of tryout sample. Not only would the particular items chosen be
different, but probably so would what the test is measuring, even though
all the items in the pool were written to the same specifications. This
follows from the finding that the intercorrelations among the noncommon parts
of the various item sets thus created (all data came from the standardization
sample who took all the items) suggest that in many instances these items
will measure different things for different groups. However, many of these
"tests" were perforce very short and not very reliable, so the generality
of this conclusion is open to question.
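The comparison in that first study can be illustrated with a small sketch: choose the "best" half of the items separately for two tryout groups and count how many choices differ. The point biserials below are invented for illustration, and the helper names `best_half` and `percent_different` are hypothetical.

```python
def best_half(point_biserials):
    """Indices of the top half of the items by point biserial
    (ties broken by item order)."""
    k = len(point_biserials) // 2
    order = sorted(range(len(point_biserials)),
                   key=lambda j: (-point_biserials[j], j))
    return set(order[:k])

def percent_different(r_standard, r_minority):
    """Percent of the items chosen from the standard group's data that
    would not be chosen from the minority group's data."""
    a, b = best_half(r_standard), best_half(r_minority)
    return 100.0 * len(a - b) / len(a)

# Hypothetical point biserials for eight items in two tryout groups.
r_standard = [0.52, 0.47, 0.41, 0.38, 0.33, 0.30, 0.22, 0.15]
r_minority = [0.44, 0.21, 0.40, 0.18, 0.35, 0.37, 0.25, 0.12]

print(percent_different(r_standard, r_minority))  # 50.0 for these made-up values
```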

Having concluded that using a separate minority tryout group would
change the test, it seemed reasonable to try having both a black and a
"standard" (i.e., representative) tryout group for Form S of the CTBS which
we were then constructing. We did this and we have learned many things, not
the least of which is that the use of multiple tryout groups is not only
expensive, but procedurally most awkward. I should note that the editorial
policy we set up was that the black tryout data would be used as a screening
device to detect items markedly bad for blacks. We emphasized low point
biserials as the screening criterion, but unlike the procedure in the first
study, all the item data were available for use. Actually the procedure
varied widely from level to level and test to test both because content
validity considerations were supposed to prevail when there was a conflict
and because each of the many different editors seemed to be able to find
ways of making unique interpretations of the data they were given. I can
say with certainty only that all of them looked at the item analysis data
for both groups and tried to produce the best test they could.
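A screening rule of this general kind might look like the sketch below. It is an assumption-laden illustration, not the editorial procedure actually followed: the cutoff of 0.20, the function name `screen_items`, and the data are all hypothetical, and content validity overrides are ignored.

```python
def screen_items(r_standard, r_screen, keep, floor=0.20):
    """Choose `keep` items by standard-group point biserial after dropping
    items whose point biserial in the screening group is below `floor`."""
    eligible = [j for j in range(len(r_standard)) if r_screen[j] >= floor]
    eligible.sort(key=lambda j: (-r_standard[j], j))
    return sorted(eligible[:keep])

# Hypothetical point biserials for eight items.
r_standard = [0.52, 0.47, 0.41, 0.38, 0.33, 0.30, 0.22, 0.15]
r_black = [0.44, 0.21, 0.40, 0.18, 0.35, 0.37, 0.25, 0.12]

# Item 3 would be chosen on the standard data alone, but it is screened out.
print(screen_items(r_standard, r_black, keep=4))
```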

In any case, whatever the practice, the resulting sets of point
biserials were affected. The kind of effect is illustrated by Table 1, which
shows frequency distributions of point biserials for the reading
comprehension test of Level 3 of CTBS/S (CTB/McGraw-Hill, 1974). As the table
illustrates, it appears from the tryout data that in most subtests at most
levels (something over 60 separate tests are involved) the items finally
chosen had higher point biserials for blacks relative to those for the
standard groups than would have been expected if the black data had been
ignored. From the standardization data it appears that there are fewer items
with really low point biserials for blacks (i.e., under 0.20) than otherwise
might have been expected. Whether the resulting tests are in fact better for
blacks than might have been expected remains to be seen. The only other
thing I can say at this point is that, based on standardization data, the
number of very low point biserials is generally higher for the Spanish
speaking group than for blacks. Since there were no data for Spanish
background groups obtained in the tryout, perhaps that result means
something. I hope in due course to be able to make stronger statements.

Please note that while offering these data as weak indications that our
test construction activities did reduce bias against blacks a little, I do
not claim that such data mean anything very positive about the validity of
the tests. However, remember we are talking about choosing items from among
those that fit the content requirements and thus the elimination of items
not related to total score among blacks should mean more adequate
measurement for that group.

Still it is all based on the assumption that overall the test is valid,
which is the assumption of goodness I mentioned at the beginning. In his
review Angoff describes any such procedure as a bootstrap operation
because of this assumption. All of the procedures being considered in this
symposium are directed at item bias rather than test bias and most of
them find this assumption necessary. Perhaps, given that content validity
is given first consideration, the proposition is reasonable. But if it is
not we are in trouble. However, we are in trouble anyway because the various
procedures have problems which tend to lead to conflict with this assumption.

A basic one is that minority groups usually score lower than the
majority. On the one hand, given content validity, this may merely mean the
low scoring group has achieved less on the average. On the other hand, these
lower scores are the starting point of the suspicion of bias, and
consequently one cannot accept the assumption until it is demonstrated
valid. Without some way to talk about the relationship between item bias
and test bias it seems that we are going around a very nearly circular path.

It became apparent to us some time ago that we needed a way to talk
about the amount of bias in a test and to relate procedures for identifying
biased items to that. In a 1972 APA paper with John Draper I had proposed
thinking of test bias as consisting of those factors in scores unique or
specific to particular groups. John Draper had been exploring ways of using
factor analysis to identify these "group specific" factors. While not
entirely successful he suggested that the amount of "group specific"
variance in contrast to the variance common to the groups could be
considered an index of the amount of bias. He further set up a model of
this and proposed we do a simulation of various item selection procedures
to see if they did affect the proportions of group specific variance or
bias.* At this point John departed for the greener grass of SRI.
Fortunately for me my colleague Wendy Yen stepped in, finished developing
the model and worked out all the procedures. Although she has been my guide
through the printout piles, she is not responsible for my conclusions.
However, you should assume that all the good ideas are hers.

It took a while to get started but we are now into the endless games
that simulations lead to. So far we have only looked at point biserials and
even that is not all done. Still I would like to discuss briefly what I
think we have found so far.

Our model consists of 10 items and two groups with three factors common 
to the two groups and 10 factors having variance in only one or the other 
group, i.e., five factors specific to each group. Groups 1 and 2 were assumed 
to have 170 and 670 members respectively. Obviously Group 1 was meant to 
represent the minority group. 

Two sets of difficulties for the 10 items were arbitrarily chosen as
were the loadings of these items on the common factors. An equal amount
of error variance for each variable in each group was assumed; errors were
assumed independent of each other and of other factors. These are displayed
in Table 2. Also postulated were the several sets of covariances shown
in Table 3. From these data the common variance, which was used throughout,
was determined. Next, starting with zero loadings, the group specific
loading was randomly incremented in Group 1. Then, not altering Group 1
further, thirty iterations (increments in group specific loadings) were made
for Group 2. For each of these thirty iterations, item point biserials were
calculated.
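The z scores listed with the item difficulties in Table 2 appear to be the normal deviates corresponding to each difficulty (for example, a difficulty of 0.500 is paired with a z score of 0.0). The sketch below shows that conversion under this assumption, using the Group 1 difficulties for the first five items; the function name `difficulty_to_z` is hypothetical, and the tabled values agree with it only to about two decimal places.

```python
from statistics import NormalDist

def difficulty_to_z(p):
    """Standard normal deviate below which a proportion p of cases fall."""
    return NormalDist().inv_cdf(p)

# Group 1 difficulties for items 1-5, as listed in Table 2.
group1 = [0.480, 0.200, 0.290, 0.390, 0.450]
print([round(difficulty_to_z(p), 3) for p in group1])
```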




*I must confess to near total ignorance of factor analysis. I did take
a course once from Karl Holzinger but it had no effect. I would like to
blame Karl but Henry Kaiser took the same course at the same time and as
far as I know it is the only such course Henry has taken also.






At the end of these thirty iterations, all the group specific loadings
in Group 2 were set back at zero. A second randomly chosen group specific
loading was incremented in Group 1. Again, not altering Group 1 further,
thirty iterations were performed on Group 2. In this way a total of 30 x 30
combinations of different amounts of group specific variance in Groups 1
and 2 were examined.
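The nesting of the two sets of iterations can be sketched as a pair of loops. This is only a reading of the description above, not the simulation program itself: it assumes each iteration adds the increment of 0.240 listed in Table 2 to one randomly chosen item-by-factor loading, and the function name `iteration_grid` and the random seed are hypothetical.

```python
import random

def iteration_grid(n_items=10, n_specific=5, n_steps=30,
                   increment=0.240, seed=0):
    """Yield (i, j, group1_loadings, group2_loadings) for each of the
    n_steps x n_steps combinations of group specific loadings."""
    rng = random.Random(seed)

    def bump(loadings):
        # Increment one randomly chosen item-by-factor loading.
        loadings[rng.randrange(n_items)][rng.randrange(n_specific)] += increment

    g1 = [[0.0] * n_specific for _ in range(n_items)]
    for i in range(1, n_steps + 1):
        bump(g1)                                   # one more step in Group 1
        g2 = [[0.0] * n_specific for _ in range(n_items)]  # Group 2 reset to zero
        for j in range(1, n_steps + 1):
            bump(g2)                               # thirty steps in Group 2
            yield i, j, [row[:] for row in g1], [row[:] for row in g2]

combos = list(iteration_grid())
print(len(combos))  # 900 combinations
```

Point biserials would then be computed from the model for each combination; that step depends on details of the model not given here and is omitted.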

Using the point biserials calculated for each iteration, the "best" five
item tests were chosen. As a first stab we proceeded in two ways. One way
was to select those items whose point biserials for the two groups differed
least; the other was to rank the point biserials within each group and choose
those with highest average rank. As might be expected these two selection
procedures produced very different sets of "best" items.
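The two selection rules just described can be written out as follows. This is a minimal sketch with made-up point biserials for ten items; the function names and the tie-breaking by item order are assumptions, not part of the procedures as run.

```python
def smallest_difference(r1, r2, keep=5):
    """Items whose point biserials in the two groups differ least."""
    order = sorted(range(len(r1)), key=lambda j: (abs(r1[j] - r2[j]), j))
    return sorted(order[:keep])

def ranks(values):
    """Rank 1 goes to the highest value; ties keep item order."""
    order = sorted(range(len(values)), key=lambda j: (-values[j], j))
    rank = [0] * len(values)
    for position, j in enumerate(order, start=1):
        rank[j] = position
    return rank

def highest_average_rank(r1, r2, keep=5):
    """Items with the best (numerically lowest) mean within-group rank."""
    k1, k2 = ranks(r1), ranks(r2)
    order = sorted(range(len(r1)), key=lambda j: (k1[j] + k2[j], j))
    return sorted(order[:keep])

# Hypothetical point biserials for ten items in Groups 1 and 2.
r1 = [0.50, 0.10, 0.45, 0.20, 0.40, 0.15, 0.35, 0.12, 0.30, 0.25]
r2 = [0.20, 0.11, 0.44, 0.50, 0.10, 0.16, 0.30, 0.45, 0.31, 0.05]

print(smallest_difference(r1, r2))   # [1, 2, 5, 6, 8]
print(highest_average_rank(r1, r2))  # [0, 2, 3, 6, 8]
```

With these invented values the first rule keeps three items with uniformly low point biserials, which gives one sense of why the two rules can pick very different sets.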

To determine the effect on test bias the ratio of group specific
variance to the total variance for the 10 items in each pair of iterations
was determined and compared to the same ratio for the five items selected.
If the latter figure is smaller one can say that the selection reduced the
amount of test bias. The outcome of that comparison can be seen in Table 4.
The proportion of the group specific variance in the selected item set does
seem to be reduced when the amount of that variance is small to begin with.
As it gets larger the selection procedure appears to become less effective.
When the bulk of the variance in Group 1 is group specific this selection
procedure tends to increase it still further.
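The before-and-after comparison can be sketched as below. This is a simplified illustration, assuming each item's variance splits additively into common, group specific, and error parts and ignoring the covariances among factors that the actual model includes; the function names and the numbers are hypothetical.

```python
def specific_share(common_var, specific_var, error_var, items):
    """Group specific variance as a share of total variance over `items`."""
    specific = sum(specific_var[j] for j in items)
    total = sum(common_var[j] + specific_var[j] + error_var[j] for j in items)
    return specific / total

def selection_effect(common_var, specific_var, error_var, selected, tol=1e-9):
    """Classify a selection by whether it raises or lowers the share."""
    everything = range(len(common_var))
    before = specific_share(common_var, specific_var, error_var, everything)
    after = specific_share(common_var, specific_var, error_var, selected)
    if after > before + tol:
        return "increased"
    if after < before - tol:
        return "decreased"
    return "unchanged"

# Hypothetical variance components for four items.
common = [0.30, 0.30, 0.30, 0.30]
specific = [0.00, 0.10, 0.20, 0.30]
error = [0.12, 0.12, 0.12, 0.12]

print(selection_effect(common, specific, error, [0, 1]))  # decreased
print(selection_effect(common, specific, error, [2, 3]))  # increased
```

Tallying these labels over the thirty Group 2 iterations at a given Group 1 iteration would give counts of the kind reported in Table 4.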

The second selection procedure appears to be ineffective in reducing 
bias when the amount overall is low but quite effective in increasing it when 
group specific variance is the majority. 

Ordinarily Groups 1 and 2 would not be separated and the item statistics 
would be calculated for the total group. Therefore after each iteration 






References

Angoff, W. H. The investigation of test bias in the absence of an outside
    criterion. Paper presented at the NIE Conference on Test Bias,
    Washington, D.C., December, 1975.

Cardall, C., & Coffman, W. E. A method for comparing the performance of
    different groups on the items in a test (ETS RB 64-61). Princeton, NJ:
    Educational Testing Service, 1964.

Coffman, W. E. Sex differences in responses to items in an aptitude test.
    Eighteenth Yearbook of the National Council on Measurement in Education,
    1961, pp. 117-124.

CTB/McGraw-Hill. Comprehensive tests of basic skills, form S: Technical
    bulletin no. 1. Monterey, CA: Author, 1974.

Green, D. R. Racial and ethnic bias in test construction. Monterey, CA:
    CTB/McGraw-Hill, 1971.

Green, D. R. Procedures for assessing bias in achievement tests. Paper
    presented at the NIE Conference on Test Bias, Washington, D.C., December,
    1975.

Green, R. L., Nyquist, J. G., & Griffore, R. J. Standardized achievement
    testing: Some implications for the lives of children. Paper presented
    at the NIE Conference on Test Bias, Washington, D.C., December, 1975.

Levine, M. The academic achievement test. American Psychologist, 1976,
    31, 228-238.






Table 1. Frequency Distributions of Point Biserials for the Tryout and
         Standardization by Ethnic Group for Reading Comprehension, CTBS/S,
         Level 3.

Pt. Bis. intervals: .800-.899, .700-.799, .600-.699, .500-.599, .400-.499,
.300-.399, .200-.299, .100-.199, .000-.099, -.001 to -.099, -.100 to -.199,
-.200 to -.299.

                               Frequencies, in descending
                               order of interval
Group                          (empty intervals omitted)      Median

TRYOUT
  Items Rejected   Standard    4, 11, 14, 9, 2, 2              .357
  Items Rejected   Black       1, 9, 6, 11, 8, 7               .253
  Items Accepted   Standard    2, 7, 14, 16, 4, 1              .417
  Items Accepted   Black       4, 3, 18, 12, 9, 3              .391

STANDARDIZATION
  Standard                     2, 11, 17, 12, 2                .461
  Black                        5, 18, 15, 3, 4                 .401
  Spanish                      7, 15, 16, 5, 2                 .396
  Other                        2, 11, 1?, 11                   .461






Table 2. Simulation Model

MODEL 3   30 ITERATIONS

INITIAL PARAMETERS

NUMBER OF OBSERVED VARIABLES = 10
NUMBER OF COMMON FACTORS = 3
NUMBER OF SPECIFIC FACTORS FOR GROUP ONE = 5
NUMBER OF SPECIFIC FACTORS FOR GROUP TWO = 5
NUMBER OF ITERATIONS FOR RUN = 30
VALUE OF INCREMENT IN GROUP SPECIFIC LOADING = 0.240

ITEM DIFFICULTIES FOR THE GROUPS

Item        1      2      3      4      5      6      7      8      9     10
Group 1   0.480  0.200  0.290  0.390  0.450  0.200  0.450  0.360  0.520  0.500
Group 2   0.780  0.700  0.520  0.610  0.710  0.250  0.350  0.590  0.720  0.710

Z SCORES FOR THE GROUPS

Item        1      2      3      4      5      6      7      8      9     10
Group 1   -.050  -.843  -.555  -.275  -.125  -.843  -.125  -.355  0.050  0.0
Group 2   0.773  0.525  0.050  0.275  0.553  -.675  1.035  0.225  0.580  0.553

LOADINGS FOR COMMON FACTORS

Item    Factor 1   Factor 2   Factor 3
  1       0.496      0.001      0.135
  2       0.069      0.046      0.008
  3      -0.041      0.041      0.115
  4       0.051      0.404     -0.166
  5       0.020     -0.045      0.109
  6       0.213      0.101      0.119
  7       0.235      0.067      0.072
  8       0.411     -0.027     -0.008
  9       0.010      0.067      0.533
 10       0.228      0.157     -0.074

COVARIANCE OF ERROR, ALL GROUPS

10 x 10 matrix with 0.120 in each diagonal position and 0.0 elsewhere.






Table 3. Model Covariances

COVARIANCE OF COMMON FACTORS

          1      2      3
1       1.000  0.474  0.437
2       0.474  1.000  0.505
3       0.437  0.505  1.000

COVARIANCES OF COMMON AND GROUP SPECIFIC FACTORS FOR GROUP 1

          1      2      3      4      5
1       0.140  0.145  0.096  0.103  0.122
2       0.135  0.146  0.114  0.092  0.138
3       0.145  0.179  0.100  0.090  0.148

COVARIANCES OF COMMON AND GROUP SPECIFIC FACTORS FOR GROUP 2

          1      2      3      4      5
1       0.139  0.143  0.112  0.15?  0.137
2       0.144  0.129  0.136  0.137  0.149
3       0.159  0.153  0.149  0.148  0.134

COVARIANCE OF GROUP SPECIFIC FACTORS FOR GROUP 1

          1      2      3      4      5
1       1.000  0.548  0.305  0.358  0.543
2       0.548  1.000  0.373  0.395  0.595
3       0.305  0.373  1.000  0.225  0.324
4       0.358  0.395  0.225  1.000  0.384
5       0.543  0.595  0.324  0.384  1.000

COVARIANCE OF GROUP SPECIFIC FACTORS FOR GROUP 2

          1      2      3      4      5
1       1.000  0.515  0.466  0.503  0.499
2       0.515  1.000  0.397  0.493  0.524
3       0.466  0.387  1.000  0.417  0.417
4       0.503  0.493  0.417  1.000  0.522
5       0.499  0.524  0.417  0.522  1.000






Table 4. Effect of two item selection procedures on the relative
         amount of group specific variance in the test.

Selection procedure: Smallest differences between point biserials

                                  Number of times in 30 the proportion
 Group 1     % Group Specific     of group specific variance was
Iteration   Variance in Group 1   increased    unchanged    decreased

    1                1                0            5           25
    5                8                1            6           23
   10               20                3            7           20
   15               33                2            9           19
   20               43                5           15           10
   25               51                6           12           12
   30               58               13            6           11

Selection procedure: Highest average rank of point biserials

                                  Number of times in 30 the proportion
 Group 1     % Group Specific     of group specific variance was
Iteration   Variance in Group 1   increased    unchanged    decreased

    1                1                7           20            3
    5                8                6           12           12
   10               20                4           22            4
   15               33                6           24            0
   20               43                1           28            1
   25               51               24            6            0
   30               58               28            2            0






Table 5. Effect of item selection procedures using two groups in
         contrast to a selection procedure using pooled data.

Selection procedure: Smallest differences between point biserials

                                  Number of times in 30 the proportion
 Group 1     % Group Specific     of group specific variance was
Iteration   Variance in Group 1   increased    unchanged    decreased

    1                1                0            5           25
    5                8                2            5           23
   10               20                3           10           17
   15               33                2           14           14
   20               43                5           13           12
   25               51                ?           1?           16
   30               58                0           20           10

Selection procedure: Highest average rank of point biserials

                                  Number of times in 30 the proportion
 Group 1     % Group Specific     of group specific variance was
Iteration   Variance in Group 1   increased    unchanged    decreased

    1                1                0           22            8
    5                8                0           15           15
   10               20                6           21            3
   15               33               14           16            0
   20               43                5           23            2
   25               51               15           15            0
   30               58               21            9            0






